An analytical approach to similarity measure selection for self-training
Authors
Abstract
We present a framework for investigating properties of similarity measures as a criterion for selecting the measure best suited to a specific task; in this paper, that task is corpus selection for self-training. We focus on the squared Pearson correlation coefficient as the property by which to rank similarity measures. Self-training is an unsupervised domain adaptation technique that involves three corpora. The choice of the unlabeled corpus in particular can be important, and we show that similarity measures can help when selecting it. In addition, we found that the correlation coefficient between the similarity scores a measure assigns and the resulting accuracy can be used to select the most suitable similarity measure, although other properties of similarity measures also play a role.
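The ranking criterion described in the abstract can be illustrated with a minimal Python sketch. It assumes that each candidate unlabeled corpus has already been scored by every similarity measure (its similarity to the target data) and used for self-training, which yields one accuracy per candidate. The function names `pearson_r` and `rank_measures`, the input format, and all numbers in the example are hypothetical illustrations, not the authors' code or results.

```python
# Minimal sketch (hypothetical names and data): rank similarity measures by the
# squared Pearson correlation between the similarity score each measure assigns
# to a candidate unlabeled corpus and the accuracy reached after self-training on it.
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def rank_measures(similarities, accuracies):
    """Return (measure name, r^2) pairs sorted from highest to lowest r^2.

    similarities: dict mapping a measure name to the similarity scores it gives
        each candidate unlabeled corpus (hypothetical input format).
    accuracies: accuracy after self-training on each candidate, in the same order.
    """
    scored = [(name, pearson_r(scores, accuracies) ** 2)
              for name, scores in similarities.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Four hypothetical candidate corpora; the numbers are made up for illustration.
    accuracies = [80.1, 81.0, 82.3, 83.0]
    similarities = {
        "measure_a": [0.2, 0.4, 0.6, 0.8],
        "measure_b": [0.9, 0.1, 0.5, 0.3],
    }
    for name, r2 in rank_measures(similarities, accuracies):
        print(f"{name}: r^2 = {r2:.3f}")
```

With these made-up numbers, measure_a ranks first (r^2 of about 0.988) and measure_b second (about 0.231). Squaring r discards the sign, so a measure whose scores are strongly anti-correlated with accuracy ranks as high as a positively correlated one. As the abstract notes, r^2 alone does not capture every property that matters when choosing a measure.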
Similar resources
Translation Invariant Approach for Measuring Similarity of Signals
In many signal processing applications, an appropriate measure for comparing two signals plays a fundamental role both in implementing the algorithm and in evaluating its performance. Several techniques have been introduced in the literature as similarity measures. However, the existing measures are often either impractical for some applications or give unsatisfactory results in some other applicati...
Information Measures Based TOPSIS Method for Multicriteria Decision Making Problem in Intuitionistic Fuzzy Environment
In fuzzy set theory, information measures play a paramount role in several areas such as decision making and pattern recognition. In this paper, a similarity measure based on the cosine function and entropy measures based on the logarithmic function are proposed for intuitionistic fuzzy sets (IFSs). The proposed similarity and entropy measures are compared with existing ones. Numerical results clearly indicate th...
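The snippet does not give the formula of the proposed cosine-based measure. For orientation only, the sketch below implements a widely cited cosine similarity measure for IFSs due to Ye (2011), which averages, over a finite universe of n elements, the cosine between the (membership, non-membership) vectors of two sets: C(A, B) = (1/n) * sum over j of (mu_A * mu_B + nu_A * nu_B) / (sqrt(mu_A^2 + nu_A^2) * sqrt(mu_B^2 + nu_B^2)). This is a swapped-in, well-known measure, not necessarily the one proposed in the paper; the function name and the list-of-pairs input format are assumptions.

```python
# Sketch of Ye's (2011) cosine similarity measure for intuitionistic fuzzy sets.
# It may differ from the cosine-based measure proposed in the paper above.
from math import sqrt


def ifs_cosine_similarity(a, b):
    """Average, over the elements of a shared universe, of the cosine between
    the (membership, non-membership) vectors of two IFSs A and B.

    a, b: equal-length lists of (mu, nu) pairs with mu, nu >= 0 and mu + nu <= 1
    (hypothetical input format).
    """
    if len(a) != len(b) or not a:
        raise ValueError("IFSs must be non-empty and defined on the same universe")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        # Validate that both pairs are legal IFS degrees (small float tolerance).
        for mu, nu in ((mu_a, nu_a), (mu_b, nu_b)):
            if mu < 0 or nu < 0 or mu + nu > 1 + 1e-12:
                raise ValueError("each pair needs mu, nu >= 0 and mu + nu <= 1")
        norm = sqrt(mu_a ** 2 + nu_a ** 2) * sqrt(mu_b ** 2 + nu_b ** 2)
        if norm == 0:
            # The cosine is undefined when either vector is (0, 0).
            raise ValueError("cosine undefined for a (0, 0) element")
        total += (mu_a * mu_b + nu_a * nu_b) / norm
    return total / len(a)


if __name__ == "__main__":
    A = [(0.5, 0.3), (0.7, 0.2)]
    B = [(0.4, 0.4), (0.7, 0.1)]
    print(ifs_cosine_similarity(A, A))  # identical sets give 1 (up to rounding)
    print(ifs_cosine_similarity(A, B))
```

Because each term is a cosine of non-negative vectors, the result lies between 0 and 1, reaching 1 when the two sets have proportional degree vectors at every element.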
Data point selection for self-training
Problems in parsing morphologically rich languages are caused, among other things, by the higher structural variability that results from less rigid word order constraints and by the larger number of distinct lexical forms. Both properties can lead to sparse data problems for statistical parsing. We present a simple approach for addressing these issues. Our approach makes use of self-training on instanc...
Determining appropriate weight for criteria in multi criteria group decision making problems using an Lp model and similarity measure
The decision matrix in group decision-making problems depends on many criteria. It is essential to know the necessity of the weight, or coefficient, of each criterion. Accurate and precise selection of weights helps achieve the intended goal. The aim of this article is to introduce a linear programming model for recognizing the importance of each criterion in multi-criteria group decision making w...